AITopics | locality-sensitive hashing

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond

Neural Information Processing Systems, Dec-25-2025, 03:11:41 GMT

Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss. First, we provide a general framework to design LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximation of the generalized Jensen-Shannon divergence by the Hellinger distance, which may be of independent interest. Next, we show a general method of reducing the problem of designing an LSH scheme for a Krein kernel (which can be expressed as the difference of two positive definite kernels) to the problem of maximum inner product search.

locality-sensitive hashing, mutual information loss, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing

Neural Information Processing Systems, May-27-2025, 03:12:58 GMT

Larger transformer models perform better on various downstream tasks, but scaling up the model size is costly. To enlarge models efficiently, the Mixture-of-Experts (MoE) architecture is widely adopted; it consists of a gate network and a series of experts and keeps the training cost constant by routing the input data to a fixed number of experts instead of all. In existing large-scale MoE training systems, experts are distributed among different GPUs for parallelization, so input data requires additional all-to-all communication to reach the target expert and perform the corresponding computation. However, upon evaluating the training process of three mainstream MoE models on commonly used GPU clusters, we found that the all-to-all communication ratio averaged around 45%, which significantly hinders the training efficiency and scalability of MoE models. In this paper, we propose LSH-MoE, a communication-efficient MoE training framework using locality-sensitive hashing (LSH). We first present the problems of scaling MoE training in existing systems and highlight the potential of exploiting token similarity to facilitate data compression. Then, we introduce an efficient LSH-based compression technique, which uses cross-polytope hashing for rapid clustering and implements a residual-based error compensation scheme to alleviate the adverse impact of compression. To verify the effectiveness of our methods, we conduct experiments on both language models (e.g., RoBERTa, GPT, and T5) and vision models (e.g., Swin) for both pre-training and fine-tuning tasks.
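The compression idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`make_rotation`, `cross_polytope_hash`, `compress`, `decompress`) are hypothetical, a Gaussian matrix stands in for a true random rotation, and the compensation step (adding each token's residual back to the expert output of its centroid) is one plausible reading of "residual-based error compensation".

```python
import random


def make_rotation(dim, seed=0):
    """Gaussian matrix used as a pseudo-random rotation (an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]


def cross_polytope_hash(x, rotation):
    """Return (axis, sign) of the largest-magnitude coordinate after rotation."""
    y = [sum(r * xi for r, xi in zip(row, x)) for row in rotation]
    axis = max(range(len(y)), key=lambda j: abs(y[j]))
    return axis, (1 if y[axis] >= 0 else -1)


def compress(tokens, rotation):
    """Group tokens by hash; return bucket centroids, assignments, residuals."""
    buckets = {}
    for idx, tok in enumerate(tokens):
        buckets.setdefault(cross_polytope_hash(tok, rotation), []).append(idx)
    dim = len(tokens[0])
    centroids = []
    assignment = [None] * len(tokens)
    residuals = [None] * len(tokens)
    for members in buckets.values():
        centroid = [sum(tokens[i][d] for i in members) / len(members)
                    for d in range(dim)]
        cid = len(centroids)
        centroids.append(centroid)
        for i in members:
            assignment[i] = cid
            residuals[i] = [t - c for t, c in zip(tokens[i], centroid)]
    return centroids, assignment, residuals


def decompress(expert_outputs, assignment, residuals):
    """Expand per-centroid outputs to per-token outputs, adding residuals back.

    Assumes the expert preserves the token dimension, as MoE feed-forward
    experts typically do.
    """
    return [[o + r for o, r in zip(expert_outputs[assignment[i]], residuals[i])]
            for i in range(len(assignment))]
```

In this sketch only the centroids would cross the all-to-all link; the residuals stay on the sending device and are applied after the expert outputs return.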

artificial intelligence, communication-efficient moe training, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Almost Linear Time Consistent Mode Estimation and Quick Shift Clustering

Hashemian, Sajjad

arXiv.org Machine Learning, Mar-10-2025

In this paper, we propose a method for density-based clustering in high-dimensional spaces that combines Locality-Sensitive Hashing (LSH) with the Quick Shift algorithm. The Quick Shift algorithm, known for its hierarchical clustering capabilities, is extended by integrating approximate Kernel Density Estimation (KDE) using LSH to provide efficient density estimates. The proposed approach achieves almost linear time complexity while preserving the consistency of density-based clustering.

algorithm, estimation, mode estimation, (15 more...)

arXiv.org Machine Learning

2503.07995

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Super-Bit Locality-Sensitive Hashing

Jianqiu Ji, Jianmin Li, Shuicheng Yan, Bo Zhang, Qi Tian

Neural Information Processing Systems, Feb-11-2025, 18:15:04 GMT

Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within $(0,\pi/2]$. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments.
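As a rough illustration of the batch orthogonalization described here, the sketch below draws Gaussian projection vectors, orthonormalizes them in batches with Gram-Schmidt, and uses the sign of each projection as one bit. The function names and the `depth` and `num_batches` parameters are hypothetical, and the code is a minimal sketch rather than the authors' reference implementation.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors):
    """Gram-Schmidt: turn a batch of vectors into an orthonormal batch."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def make_projections(dim, depth, num_batches, seed=0):
    """Draw Gaussian vectors and orthonormalize them in batches of `depth`."""
    if depth > dim:
        raise ValueError("batch size cannot exceed the dimension")
    rng = random.Random(seed)
    projections = []
    for _ in range(num_batches):
        batch = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(depth)]
        projections.extend(orthonormalize(batch))
    return projections


def sign_code(x, projections):
    """One bit per projection vector: 1 if the projection is non-negative."""
    return [1 if dot(x, p) >= 0 else 0 for p in projections]


def estimate_angle(code_a, code_b):
    """Estimate the angle (radians) as pi times the normalized Hamming distance."""
    hamming = sum(a != b for a, b in zip(code_a, code_b))
    return math.pi * hamming / len(code_a)
```

Setting `depth=1` skips the orthogonalization between vectors and reduces the sketch to plain SRP-LSH, which makes the two variants easy to compare on the same data.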

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.14)
Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.30)

Reviews: Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond

Neural Information Processing Systems, Jan-22-2025, 06:46:24 GMT

The paper presents locality-sensitive hashing schemes for well-studied distance functions between probability distributions. The new schemes are based on two ideas. The first one is to approximate the distance function of interest by another distance function for which LSH schemes are known. In particular, the paper shows how to approximate MIL divergence and triangular discrimination by the Hellinger distance, for which LSH schemes are known. The second is specific to the MIL divergence, and involves representing the latter distance function as a so-called Krein kernel, and designing an asymmetric LSH scheme.
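One reason LSH schemes are available for the Hellinger distance is that it equals the Euclidean distance between square-root-transformed distributions, up to a factor of 1/sqrt(2), so a standard Euclidean LSH can be applied after the transform. The sketch below shows that reduction with a single p-stable (Gaussian projection) hash. The function names and the `width` parameter are hypothetical, and the paper may use a different concrete scheme.

```python
import math
import random


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


def embed(p):
    """Map p to sqrt(p / 2); Euclidean distance then equals Hellinger distance."""
    return [math.sqrt(a / 2.0) for a in p]


def make_l2_hash(dim, width=0.25, seed=0):
    """One Gaussian-projection hash for Euclidean distance on the embedding."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(p):
        v = embed(p)
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / width)

    return h
```

In practice several such hashes would be concatenated and repeated over multiple tables to tune the collision probabilities; the single hash here only shows the reduction.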

distance function, locality-sensitive hashing, probability distribution, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.37)

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond

Neural Information Processing Systems, Oct-9-2024, 16:28:27 GMT

Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss. First, we provide a general framework to design LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximation of the generalized Jensen-Shannon divergence by the Hellinger distance, which may be of independent interest.

generalized jensen-shannon divergence, locality-sensitive hashing, mutual information loss, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Inexact Simplification of Symbolic Regression Expressions with Locality-sensitive Hashing

Aldeia, Guilherme Seidyo Imai, de Franca, Fabricio Olivetti, La Cava, William G.

arXiv.org Artificial Intelligence, Apr-8-2024

Symbolic regression (SR) searches for parametric models that accurately fit a dataset, prioritizing simplicity and interpretability. Despite this secondary objective, studies point out that the models are often overly complex due to redundant operations, introns, and bloat that arise during the iterative process, and can hinder the search with repeated exploration of bloated segments. Applying a fast heuristic algebraic simplification may not fully simplify the expression, and exact methods can be infeasible depending on the size or complexity of the expressions. We propose a novel agnostic simplification and bloat control for SR employing an efficient memoization with locality-sensitive hashing (LSH). The idea is that expressions and their sub-expressions traversed during the iterative simplification process are stored in a dictionary using LSH, enabling efficient retrieval of similar structures. We iterate through the expression, replacing subtrees with others with the same hash if they result in a smaller expression. Empirical results show that applying this simplification during evolution performs as well as or better than no simplification in minimizing error, while significantly reducing the number of nonlinear functions. This technique can learn simplification rules that work in general or for a specific problem, and improves convergence while reducing model complexity.
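The memoization loop described here can be sketched as follows. The abstract does not say what is hashed, so this sketch assumes each subtree is hashed by its outputs on a few sample inputs, using Gaussian-projection (p-stable) hashes. The tuple encoding of expressions, the function names, and the `num_hashes` and `width` parameters are all hypothetical.

```python
import math
import random


def evaluate(expr, x):
    """Evaluate an expression: 'x', a number, or (op, left, right)."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return float(expr)
    op, left, right = expr
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "+" else a * b


def size(expr):
    """Number of nodes in the expression tree."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + size(expr[1]) + size(expr[2])


def make_hasher(samples, num_hashes=6, width=0.1, seed=0):
    """Hash an expression by its output vector on `samples` (an assumption)."""
    rng = random.Random(seed)
    params = [([rng.gauss(0.0, 1.0) for _ in samples], rng.uniform(0.0, width))
              for _ in range(num_hashes)]

    def hasher(expr):
        out = [evaluate(expr, x) for x in samples]
        return tuple(
            math.floor((sum(a * o for a, o in zip(vec, out)) + b) / width)
            for vec, b in params)

    return hasher


def simplify(expr, hasher, table):
    """Bottom-up: swap each subtree for the smallest stored one with its hash."""
    if isinstance(expr, tuple):
        op, left, right = expr
        expr = (op, simplify(left, hasher, table),
                simplify(right, hasher, table))
    key = hasher(expr)
    best = table.get(key)
    if best is None or size(expr) < size(best):
        table[key] = expr
        return expr
    return best
```

Because nearby output vectors can share a hash, a replacement may be only approximately equivalent to the subtree it replaces, which matches the "inexact" simplification in the title.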

complexity, expression, simplification, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3638529.3654147

2404.05898

Country:

South America > Brazil > São Paulo (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (1.00)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing

Meiner, Lukas, Mehnert, Jens, Condurache, Alexandru Paul

arXiv.org Artificial Intelligence, Sep-29-2023

To reduce the computational cost of convolutional neural networks (CNNs) for usage on resource-constrained devices, structured pruning approaches have shown promising results, drastically reducing floating-point operations (FLOPs) without substantial drops in accuracy. However, most recent methods require fine-tuning or specific training procedures to achieve a reasonable trade-off between retained accuracy and reduction in FLOPs. This introduces additional cost in the form of computational overhead and requires training data to be available. To this end, we propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module. It instantly reduces the network's test-time inference cost without requiring any training or fine-tuning. We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH) to detect redundancies in the channel dimension. Similar channels are aggregated to reduce the input and filter depth simultaneously, allowing for cheaper convolutions. We demonstrate our approach on the popular vision benchmarks CIFAR-10 and ImageNet. In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
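The channel aggregation described here can be sketched for the simplest case, a single 1x1 (pointwise) filter. The sketch assumes sign-random-projection hashing of flattened channels, averaging of the feature channels in each bucket, and summing of the matching filter weights. These choices and all names (`srp_hash`, `merge_channels`, `pointwise_conv`, `num_bits`) are illustrative assumptions, not the HASTE module itself.

```python
import random


def srp_hash(vec, planes):
    """Sign-random-projection hash: one bit per random hyperplane."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)


def merge_channels(features, weights, num_bits=8, seed=0):
    """Merge channels that share a hash.

    features: list of channels, each a flat list of pixel values.
    weights:  one 1x1 filter weight per input channel.
    Returns reduced features (bucket means) and weights (bucket sums).
    """
    rng = random.Random(seed)
    n = len(features[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_bits)]
    buckets = {}
    for c, channel in enumerate(features):
        buckets.setdefault(srp_hash(channel, planes), []).append(c)
    merged_features, merged_weights = [], []
    for members in buckets.values():
        merged_features.append(
            [sum(features[c][i] for c in members) / len(members)
             for i in range(n)])
        merged_weights.append(sum(weights[c] for c in members))
    return merged_features, merged_weights


def pointwise_conv(features, weights):
    """1x1 convolution with a single output channel."""
    n = len(features[0])
    return [sum(w * ch[i] for w, ch in zip(weights, features))
            for i in range(n)]
```

When the merged channels are exactly equal, the reduced convolution reproduces the original output; for channels that are only similar, the difference is the approximation error that is traded for fewer multiply-accumulates.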

cnn, instant complexity reduction, locality-sensitive hashing

arXiv.org Artificial Intelligence

2309.17211

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond

Chen, Lin, Esfandiari, Hossein, Fu, Gang, Mirrokni, Vahab

Neural Information Processing Systems, Mar-19-2020, 00:46:48 GMT

Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss. First, we provide a general framework to design LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximation of the generalized Jensen-Shannon divergence by the Hellinger distance, which may be of independent interest.

generalized jensen-shannon divergence, locality-sensitive hashing, mutual information loss, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Super-Bit Locality-Sensitive Hashing

Ji, Jianqiu, Li, Jianmin, Yan, Shuicheng, Zhang, Bo, Tian, Qi

Neural Information Processing Systems, Dec-31-2012

Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within $(0,\pi/2]$. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.14)
Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.30)
